A Memory-Based Approach to Anti-Spam Filtering

Authors

  • Georgios Sakkis
  • Ion Androutsopoulos
  • Georgios Paliouras
  • Vangelis Karkaletsis
  • Constantine D. Spyropoulos
  • Panagiotis Stamatopoulos
Abstract

This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, also known as “spam”, floods the mailboxes of users, causing frustration, wasting bandwidth and money, and exposing minors to unsuitable content. Using a recently introduced publicly available corpus, a thorough investigation of the effectiveness of a memory-based anti-spam filter is performed, including different attribute and distance weighting schemes, and studies on the effect of the neighborhood size, the size of the attribute set, and the size of the training corpus. Three different cost scenarios are identified, and suitable cost-sensitive evaluation functions are employed. We conclude that memory-based anti-spam filtering is practically feasible, especially when combined with additional safety nets. Compared to a previously tested Naïve Bayes filter, the memory-based filter performs on average better, particularly when the misclassification cost for non-spam messages is high.
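
To make the idea of a memory-based, cost-sensitive filter concrete, here is a minimal sketch of a k-nearest-neighbour classifier over binary word attributes. It is an illustration only, not the configuration evaluated in the paper: the overlap distance, the inverse-distance vote weighting, the function names, and the decision rule (label a message spam only when the weighted spam share exceeds `lam / (1 + lam)`, where `lam` is a hypothetical cost ratio for blocking a legitimate message versus letting spam through) are all assumptions made for this example.

```python
from collections import defaultdict


def overlap_distance(a, b):
    """Count the attribute positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_spam_filter(train, query, k=3, lam=1.0):
    """Classify `query` from its k nearest stored messages.

    train: list of (vector, label) pairs, label is "spam" or "legit".
    lam:   assumed cost ratio; larger values demand stronger evidence
           before a message is labelled spam.
    """
    # "Memory-based": no model is fitted, the stored examples are searched.
    neighbours = sorted(
        train, key=lambda item: overlap_distance(item[0], query)
    )[:k]

    votes = defaultdict(float)
    for vector, label in neighbours:
        distance = overlap_distance(vector, query)
        # Closer neighbours get larger votes (one possible weighting scheme).
        votes[label] += 1.0 / (1.0 + distance)

    spam_share = votes["spam"] / sum(votes.values())
    # Cost-sensitive threshold: 0.5 when lam == 1, approaching 1 as lam grows.
    return "spam" if spam_share > lam / (1.0 + lam) else "legit"
```

With `lam=1.0` the rule reduces to a weighted majority vote. Raising `lam` makes the filter more conservative, so fewer legitimate messages are blocked at the price of letting more spam through.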

Similar resources

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...

SPAM -- Technological and Legal Aspects

In this paper an attempt is made to review technological, economical and legal aspects of the spam in detail. The technical details will include different techniques of spam control e.g., filtering techniques, Genetic Algorithm, Memory Based Classifier, Support Vector Machine Method, etc. The economic aspect includes Shaping/Rate Throttling Approach/Economic Filtering and Pricing/Payment based ...

A Case-Based Approach to Spam Filtering that Can Track Concept Drift

There are a few key benefits of a case-based approach to spam filtering. First, the many different sub-types of spam suggest that a local learner, such as Case-Based Reasoning (CBR) will perform well. Second, the lazy approach to learning in CBR allows for easy updating as new types of spam arrive. Third, the case-based approach to spam filtering allows for the sharing of cases and thus a shari...

Survey on Spam Filtering Techniques

In recent years, spam has become a major problem for the Internet and electronic communication. Many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. The classification, evaluation, and comparison of traditional and learning-based methods are provided. Some personal anti-spam products are tested and compared. The statement f...

Evolutionary Symbiotic Feature Selection for Email Spam Detection

This work presents a symbiotic filtering approach enabling the exchange of relevant word features among different users in order to improve local anti-spam filters. The local spam filtering is based on a Content-Based Filtering strategy, where word frequencies are fed into a Naive Bayes learner. Several Evolutionary Algorithms are explored for feature selection, including the proposed symbiotic ...

Publication date: 2001